In December 2019 the first cases of a disease involving a 'strange pneumonia' (later attributed to the virus SARS-CoV-2) were detected in Wuhan, China. Though some voices worried about the detection of a new, unknown disease, comparisons to past epidemics such as H1N1, SARS and MERS brought comfort to leaders and were used to soothe public opinion. As time passed, scientists learned more about the new disease, and due to its high transmissibility it soon became clear that there was no point of comparison between the new virus and the ones before it.
By March 2020 the World Health Organization raised the alarm and declared a global pandemic. It became clear that the virus had spread faster and farther than anyone could have imagined, and it was evident that decisions to put a brake on the spread were being taken too late. Leaders in Europe realized the magnitude of what they were facing and decided to impose strict lockdown measures, even if that meant shutting down the global economy.

After looking at the advance of the virus in its neighbouring countries, the government of the UK also decided to act. On 20 March it was announced that all public venues had to close. This measure applied to pubs, restaurants, gyms, nightclubs, theatres and cinemas; schools and universities were also included. Three days later it was stated in a television broadcast that a general order to stay at home was in place and that restrictions on freedom of movement were going to be enforced by law. British residents were advised to stay home throughout this period except for essential purchases, essential work travel (remote work had to be done where possible), medical needs, and providing care for others. These restrictions were eased during late May and June, when students were allowed to return to schools and some non-essential retailers as well as public venues were allowed to start opening again.
The introduction of the lockdown had an immense effect on London. The centre of the city, which used to be full of tourists, public venues, social life and a vast number of commuters, turned into a rather lonesome place. As the inflow of tourists was severely hindered, non-essential retail shops closed and commuters started to work remotely, several changes in the city began to manifest.
In this notebook I present an exploratory analysis based on data at the crime-event level to answer whether there was any change in the geographic distribution of crime in the Greater London area, and in its composition, when Pre Lockdown, Lockdown and Post Lockdown times are compared.
The analysis is based on several sources of data. The main dataset is obtained through the custom download tool provided on the Police Data web page; it provides street-level crime and outcomes, broken down by police force and 2011 lower layer super output area (LSOA). We consider crime data involving both the City of London Police and the Metropolitan Police Service. The period chosen spans January to September of 2020. Based on the rigor of the lockdown measures imposed, I categorize the first quarter of the year as Pre Lockdown, the second quarter as Lockdown and the third as Post Lockdown.
The information is complemented with geographical data for the LSOAs, MSOAs (middle layer super output areas) and the London Boroughs. This data is taken from the London Data Store Statistical GIS Boundaries, which contains several shapefiles delimiting the boundaries of LSOAs, MSOAs and Boroughs in the Greater London Area.
Finally, I also consider data obtained through the use of the Police API regarding the number of stop and search procedures carried out by the two police forces mentioned above during the first three quarters of 2020.
All code used to obtain the main dataset and the API information is included in the notebook Dataset.ipynb. The API data consist of counts of stop (and possibly search) procedures made on the street by police officers in each of the MSOAs composing the Greater London Area, broken down by age group. The requests are made through the requests module in Python and results are parsed directly from the response JSON.
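To illustrate the parsing step, here is a minimal sketch assuming the public data.police.uk stops-street endpoint; the sample payload, coordinates and the helper name `parse_stop_counts` are illustrative and only mimic the shape of the real response:

```python
import json
from collections import Counter

# A live request would look roughly like (not executed here):
# requests.get("https://data.police.uk/api/stops-street",
#              params={"lat": 51.5074, "lng": -0.1278, "date": "2020-01"})

def parse_stop_counts(response_text):
    """Count stop-and-search records by age range from a JSON list of stops."""
    stops = json.loads(response_text)
    return Counter(s.get("age_range") or "Unknown" for s in stops)

# Illustrative payload mimicking the structure of the API response
sample = json.dumps([
    {"age_range": "18-24", "type": "Person search"},
    {"age_range": "25-34", "type": "Person search"},
    {"age_range": "18-24", "type": "Person and Vehicle search"},
    {"age_range": None,    "type": "Vehicle search"},
])

counts = parse_stop_counts(sample)
print(counts)  # Counter({'18-24': 2, '25-34': 1, 'Unknown': 1})
```

Records with a null age range are bucketed as "Unknown" rather than dropped, so the per-MSOA totals stay consistent.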
I am interested in answering the following questions:
An interesting topic I address in the last part is whether there exist any associations between the types of crime that occurred in certain geographical aggregations and the Pre Lockdown, Lockdown and Post Lockdown periods.
After gathering the data from the different monthly files obtained by each police force, I carried out some brief data cleaning steps.
## Initial Cleaning
import pandas as pd
import numpy as np
## Read data
data = pd.read_csv("Data/DataProject.csv")
## Columns
print(list(data.columns))
## Number of observations
print("Number of observations: " + str(data.shape[0]))
## Head
data.head()
['Unnamed: 0', 'Month', 'Longitude', 'Latitude', 'Location', 'LSOA code', 'LSOA name', 'Crime type', 'Last outcome category', 'Id']
Number of observations: 897349
| | Unnamed: 0 | Month | Longitude | Latitude | Location | LSOA code | LSOA name | Crime type | Last outcome category | Id |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | -0.106453 | 51.518207 | City of London | E01000916 | Camden 027B | Theft from the person | Investigation complete; no suspect identified | 0 |
| 1 | 1 | 1 | -0.117684 | 51.522003 | City of London | E01000920 | Camden 027D | Drugs | Court result unavailable | 1 |
| 2 | 2 | 1 | -0.111497 | 51.518226 | City of London | E01000914 | Camden 028B | Other theft | Unable to prosecute suspect | 2 |
| 3 | 3 | 1 | -0.111962 | 51.518494 | City of London | E01000914 | Camden 028B | Other theft | Investigation complete; no suspect identified | 3 |
| 4 | 4 | 1 | -0.113256 | 51.516824 | City of London | E01000914 | Camden 028B | Other theft | Investigation complete; no suspect identified | 4 |
Both the Id and the Unnamed: 0 columns are dropped, as they don't contain useful information (they were mostly used as identifiers in the raw data). Some columns are also renamed to make working with them easier.
The raw data only have missing values in the Last outcome category variable; a missing value here merely denotes that no outcome was recorded for the crime. In fact, the value is missing only for crimes of type Anti-social behaviour, so we can simply fill it in.
## Information on data
data.info()
## Drop identifier columns (Id and Unnamed: 0)
data.drop(["Id","Unnamed: 0"],axis=1,inplace=True)
## Rename Columns
data.rename({"Crime type":"Crime", "Last outcome category":"Category"},axis=1,
inplace=True)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 897349 entries, 0 to 897348
Data columns (total 10 columns):
 #   Column                 Non-Null Count   Dtype
---  ------                 --------------   -----
 0   Unnamed: 0             897349 non-null  int64
 1   Month                  897349 non-null  int64
 2   Longitude              897349 non-null  float64
 3   Latitude               897349 non-null  float64
 4   Location               897349 non-null  object
 5   LSOA code              897349 non-null  object
 6   LSOA name              897349 non-null  object
 7   Crime type             897349 non-null  object
 8   Last outcome category  576260 non-null  object
 9   Id                     897349 non-null  int64
dtypes: float64(2), int64(3), object(5)
memory usage: 68.5+ MB
## Handling of missing values
## Missing values in Category denote that there was no outcome;
## they occur only for Anti-social behaviour
print(data[data.Category.isnull()]["Crime"].value_counts())
## Number of unique outcome categories per crime (before fill)
bfill = data.groupby("Crime").Category.nunique()
## Fill
data.fillna("No further investigation",inplace=True)
Anti-social behaviour    321089
Name: Crime, dtype: int64
## Check filled missing values by group before|after
afill = data.groupby("Crime").Category.nunique()
check = pd.merge(bfill.reset_index(),afill,on="Crime")
check
| | Crime | Category_x | Category_y |
|---|---|---|---|
| 0 | Anti-social behaviour | 0 | 1 |
| 1 | Bicycle theft | 8 | 8 |
| 2 | Burglary | 12 | 12 |
| 3 | Criminal damage and arson | 11 | 11 |
| 4 | Drugs | 12 | 12 |
| 5 | Other crime | 10 | 10 |
| 6 | Other theft | 11 | 11 |
| 7 | Possession of weapons | 11 | 11 |
| 8 | Public order | 12 | 12 |
| 9 | Robbery | 9 | 9 |
| 10 | Shoplifting | 12 | 12 |
| 11 | Theft from the person | 8 | 8 |
| 12 | Vehicle crime | 9 | 9 |
| 13 | Violence and sexual offences | 12 | 12 |
I also generate the variable used to identify the three periods of interest: Pre Lockdown, Lockdown and Post Lockdown. All text variables that represent categories are converted to the categorical dtype. The Covid variable is reordered to match the real occurrence of events, and the categories of Location, the variable related to the police force involved, are recoded. Note that Location is really an identifier of the police force rather than a place.
## Create Lockdown variable
conds = [data.Month.lt(4), data.Month.lt(7), data.Month.lt(10)]
data['Covid'] = np.select(conds, ["PreLockdown","Lockdown","PostLockdown"])
## Replace Month with complete Date
data['Date'] = pd.to_datetime(["2020-" + str(x) + "-01" for x in data.Month])
## Convert categorical variables to category
for col in ['Location', 'LSOA name','Crime', 'Category', 'Covid']:
data[col] = data[col].astype('category')
## Reorder|Rename categories
data['Covid'].cat.reorder_categories(['PreLockdown', 'Lockdown', 'PostLockdown'],
inplace=True)
data['Location'].cat.rename_categories({'City of London ':'City of London'},
                                       inplace=True)
Finally, crime types with counts below 10,000 are lumped into the Other crime category and results are saved to a pickle file.
data['Crime'].value_counts()
Anti-social behaviour           321089
Violence and sexual offences    170308
Vehicle crime                    80685
Other theft                      62788
Burglary                         45472
Public order                     40618
Drugs                            39495
Criminal damage and arson        38346
Shoplifting                      26305
Theft from the person            22206
Robbery                          20357
Bicycle theft                    17777
Other crime                       7533
Possession of weapons             4370
Name: Crime, dtype: int64
## Lump low counts (Possession of weapons, Other crime) into Other crime
need = data['Crime'].value_counts().index[-1:-3:-1]
data['Crime'] = np.where(~data['Crime'].isin(need), data['Crime'], 'Other crime')
## Delete variables no longer used
del bfill, afill, check, conds, need
## Save to pickle
#pickle.dump(data, open("Data/DataClean.p", "wb"))
We start by loading the cleaned data from the section above. The first thing to notice is that we have 6245 distinct LSOAs, all of which appear in the Metropolitan Service records.
The so-called Location variable doesn't represent a location but rather the police force that attended the crime. There is an overlap in certain LSOAs for which both the Metropolitan Service and the City of London Police can be in charge of the criminal investigation; the overlap is mainly in the centre of the city (what Londoners know as Zone 1).
## Modules
import pandas as pd
import numpy as np
import seaborn as sns
import pickle
import matplotlib.pyplot as plt
import re
import geopandas as gpd
import plotly.express as px
##--------------------------------------------------------------##
## Load Data
data = pickle.load(open("Data/DataClean.p", "rb" ) )
## Total Number of LSOAs
print("Total number of LSOAS: " + str(len(data['LSOA name'].unique())))
## Total Number of LSOAS by Location
print(data.groupby('Location')['LSOA name'].nunique())
## Attended by City of London Force
print("Attended by City of London police: \n")
print(data[data.Location=="City of London"]["LSOA name"].unique())
Total number of LSOAS: 6245
Location
City of London            46
Metropolitan Service    6245
Name: LSOA name, dtype: int64
Attended by City of London police:

['Camden 027B', 'Camden 027D', 'Camden 028B', 'City of London 001A', 'City of London 001B', ..., 'Southwark 002A', 'Southwark 003H', 'Tower Hamlets 021D', 'Hackney 003C', 'Tower Hamlets 007C']
Length: 46
Categories (46, object): ['Camden 027B', 'Camden 027D', 'Camden 028B', 'City of London 001A', ..., 'Southwark 003H', 'Tower Hamlets 021D', 'Hackney 003C', 'Tower Hamlets 007C']
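To make the overlap between forces concrete, the set of LSOAs attended by both can be recovered with a grouped set intersection. The toy frame below is illustrative, not actual records:

```python
import pandas as pd

# Toy frame mimicking the Location / LSOA structure of the crime data
toy = pd.DataFrame({
    "Location": ["City of London", "City of London",
                 "Metropolitan Service", "Metropolitan Service",
                 "Metropolitan Service"],
    "LSOA name": ["Camden 027B", "City of London 001A",
                  "Camden 027B", "Hackney 003C", "City of London 001A"],
})

# Set of LSOAs attended by each force, then their intersection
by_force = toy.groupby("Location")["LSOA name"].agg(set)
overlap = by_force["City of London"] & by_force["Metropolitan Service"]
print(sorted(overlap))  # ['Camden 027B', 'City of London 001A']
```

Applied to the real data, the intersection would recover the 46 LSOAs attended by the City of London Police that also appear under the Metropolitan Service.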
In the cleaned dataset we have 13 different categories of crime types and 15 different outcomes related to these categories. I decided not to lump the low counts in the outcomes variable, since the analysis does not focus much on this variable but rather on the types of crime and their comparison across the quarters of the year.
## Unique values of Crime
data.Crime.value_counts()
Anti-social behaviour           321089
Violence and sexual offences    170308
Vehicle crime                    80685
Other theft                      62788
Burglary                         45472
Public order                     40618
Drugs                            39495
Criminal damage and arson        38346
Shoplifting                      26305
Theft from the person            22206
Robbery                          20357
Bicycle theft                    17777
Other crime                      11903
Name: Crime, dtype: int64
## Unique values of Category
data.Category.value_counts()
Investigation complete; no suspect identified          374510
No further investigation                               321089
Status update unavailable                               78385
Under investigation                                     61111
Awaiting court outcome                                  22881
Local resolution                                        22253
Court result unavailable                                 6672
Offender given penalty notice                            5395
Offender given a caution                                 4299
Offender given a drugs possession warning                 358
Unable to prosecute suspect                               289
Formal action is not in the public interest                53
Further investigation is not in the public interest        40
Action to be taken by another organisation                 11
Suspect charged as part of another case                     3
Name: Category, dtype: int64
To start the analysis it's important to consider how the aggregated counts of crime evolved through the year. I present the dynamics of these counts separated by the police force involved, along with the aggregate. Data are aggregated over all LSOAs.
data_agg = data[['Date','Location','Crime', 'Category']]. \
groupby(['Date','Location', 'Crime']).count(). \
fillna(0)
data_agg.reset_index(inplace=True)
data_agg.rename({'Category':'Sum'}, axis=1, inplace = True)
data_agg['Date']=data_agg['Date'].dt.strftime('%B')
data_agg2 = data_agg.groupby(["Date", "Crime"])[["Sum"]].sum().reset_index()
data_agg2['Location'] = 'Aggregate'
data_agg = pd.concat([data_agg,data_agg2])
data_agg['Date'] = data_agg['Date'].astype("category")
months = ['January', 'February', 'March', 'April', 'May', 'June', 'July',
'August', 'September']
data_agg['Date'] = data_agg['Date'].cat.reorder_categories(months)
## Total counts showing number of crimes by type and force
sns.set_style("dark")
sns.set_palette("bright")
p_1 = sns.relplot(x='Date',y='Sum', hue="Crime",
data=data_agg,kind='line',col='Location',
style = 'Crime',
facet_kws={'sharey': False, 'sharex': True})
p_1.set_xticklabels(rotation=30)
p_1.fig.suptitle("Total Crime Counts by Police Force Involved", y = 1.1)
p_1.set_titles("{col_name}")
p_1
Several patterns emerge.
In order to get a better sense of the changes in crime counts by month, I exclude the Anti-social behaviour category because it generates a large imbalance in the counts. Later I show a plot in which this category is included, using a logarithmic transformation.
## Aggregate counts excluding Anti-social behaviour
data_focus = data_agg[(data_agg.Location == "Aggregate") &
(data_agg.Crime != "Anti-social behaviour")]
p1_a = sns.relplot(x = "Date", y = "Sum", hue = "Crime",
data=data_focus, kind = "line", palette = "Spectral")
p1_a.set_xticklabels(rotation=30)
p1_a.fig.suptitle("Total Crime Counts by Crime Type", y = 1.1)
p1_a
As we can see, the Lockdown period was reflected in a decrease in almost all types of crime, except for Bicycle theft and Drugs: total cases of the former increased slightly, while the latter saw a significant increase in April and May.
The sharpest decreases during Lockdown were seen in Theft from the person and Other theft, and there was also a slight decrease in Violence and sexual offences. All in all, crimes related to the flow of people into the city, such as almost all forms of theft and robbery, decreased in response to the strict lockdown measures.
The Post Lockdown period was characterized by a marked increase in Violence and sexual offences, well above their Pre Lockdown level. In general, all other crimes seemed to revert to Pre Lockdown levels too, except Bicycle theft, which kept increasing.
All of these facts are easily seen in the figure below, where I plot total crime counts using a logarithmic transformation so as to account for the marked differences in counts by type.
## Total crimes by type comparing quarters
data_agg = data[["LSOA name", "Covid", "Crime", "Category"]].groupby(["LSOA name", "Covid", "Crime"]).count().fillna(0)
data_agg['LogCrime'] = np.log(data_agg['Category']+1)
data_agg.reset_index(inplace=True)
## By crime type means
p_4 = sns.catplot(x="Covid",y="LogCrime", hue="Crime", kind="point",
data=data_agg)
p_4.fig.suptitle("Comparison of LogCrimes by Type", y = 1.05)
p_4.set(xlabel="")
While there are several interesting behaviours in the aggregated data for the Greater London Area, there is also a lot of variation in the data at the LSOA level. In this section I explore the results for this geographical disaggregation in more detail.
Results are shown using a log transformation of the counts, due to the highly skewed distribution of total crimes by location.
data_agg = data[["LSOA name", "Covid", "Crime", "Location"]] \
.groupby(["LSOA name", "Covid", "Location"]).count()
data_agg.fillna(0,inplace=True)
data_agg.reset_index(inplace=True)
## Crime counts are highly skewed by LSOA
data_agg['Crime'].describe()
count    37470.000000
mean        23.948465
std         51.203699
min          0.000000
25%          0.000000
50%          0.000000
75%         37.000000
max       2271.000000
Name: Crime, dtype: float64
## Using logarithm
data_agg['LogCrime'] = np.log(data_agg['Crime']+1)
## Aggregate (both Police Forces)
data_agg2 = data_agg.groupby(["LSOA name","Covid"])[["Crime"]].sum().reset_index()
data_agg2["LogCrime"] = np.log(data_agg2['Crime']+1)
data_agg2['Location'] = 'Aggregate'
data_agg = pd.concat([data_agg,data_agg2])
## Density plot
sns.set_palette("crest")
p_2 = sns.displot(x="LogCrime",data=data_agg,hue="Covid",alpha=0.3,
kind="kde",fill=True, col="Location", facet_kws={'sharey': False, 'sharex': False})
p_2.fig.suptitle("Distribution of Total Crime by LSOA",y=1)
p_2.set_titles("{col_name}")
The highly skewed distribution observed for the City of London force (even in logarithm) reflects the fact that most of the crime this force responds to is located in a couple of LSOAs. Checking the data, most of the crime counts in this case occur in the LSOAs City of London 001F and City of London 001G. The first includes tube stations such as Bank and Liverpool Street, while the second is closer to Blackfriars station. Except for these two, the other locations have relatively low crime counts.
The aggregate distribution closely mirrors that of the Metropolitan Service; again, this happens because of the high proportion of crime cases that this force responds to. The aggregate distribution is bimodal, suggesting two types of LSOAs: some with very low crime counts and others with a relatively more moderate number of occurrences.
Overall, the second mode of the distribution shifted when comparing the Pre Lockdown, Lockdown and Post Lockdown scenarios. It seems that crime in LSOAs with a low to moderate number of occurrences increased during Lockdown, and increased further in Post Lockdown, compared with the results of the first quarter. This is probably related to the huge increase in Anti-social behaviour crimes during the first nine months of 2020.
## Check Percentiles: Table
def percentile(n):
def percentile_(x):
return np.percentile(x, n)
percentile_.__name__ = 'percentile_%s' % n
return percentile_
data_aux = data_agg.loc[data_agg.Location == "Aggregate", ["Covid","Crime"]].groupby("Covid")
funs = [np.mean,np.std]
funs.extend(list(map(percentile, [20, 40, 60, 80, 100])))
data_aux = data_aux.agg(funs)
## Average # of cases by LSOA along with Std and quantiles
data_aux
| Covid | mean | std | percentile_20 | percentile_40 | percentile_60 | percentile_80 | percentile_100 |
|---|---|---|---|---|---|---|---|
| PreLockdown | 42.515292 | 76.159918 | 1.0 | 24.0 | 37.0 | 59.0 | 2271.0 |
| Lockdown | 51.858927 | 49.707895 | 1.0 | 34.0 | 55.0 | 82.0 | 625.0 |
| PostLockdown | 49.316573 | 63.939901 | 2.0 | 29.0 | 47.0 | 75.0 | 1311.0 |
The table above confirms that there was indeed a shift in the average number of crime cases when comparing Lockdown with Pre Lockdown, along with a very important decrease in variability: the maximum number of crime cases in a single LSOA fell from 2271 to only 625 during the Lockdown period. However, once the restrictions were lifted, average crime cases decreased slightly but variability increased again, and the maximum number of crime cases in a single LSOA roughly doubled from Lockdown to Post Lockdown.
The apparent shift in the crime distribution during Lockdown, along with the decrease in variability, merits further investigation. The figure below shows how crime counts changed across the three periods in the 20 LSOAs with the highest number of crime cases during the first nine months of 2020.
We note that overall crime cases fell in each of these LSOAs; the decrease was sharper in places close to the centre of the city and in those that had a high number of cases Pre Lockdown, namely in Westminster, Camden and Hackney. This decrease explains part of the reduction in total variability observed. It is also worth mentioning that during the Post Lockdown period the crime cases in these LSOAs increased again and reached almost Pre Lockdown levels.
## Crime cases in 20 LSOAS with Most Crime
check_data = data[["Covid","LSOA name","Location","Crime"]].groupby(["Covid","Location",
"LSOA name"]).count().fillna(0)
check_data.reset_index(inplace=True)
data_agg = check_data.groupby(['LSOA name'])[['Crime']].sum()
cats = data_agg.sort_values(['Crime'],
                            ascending=False).head(20).reset_index()
## Copy to avoid SettingWithCopyWarning on the assignments below
data_agg = check_data[check_data["LSOA name"].isin(cats["LSOA name"].unique())].copy()
data_agg["LSOA name"] = data_agg["LSOA name"].cat.remove_unused_categories()
data_agg['LogCrime'] = np.log(data_agg['Crime']+1)
order = data_agg[data_agg.Covid == "PreLockdown"].groupby("LSOA name").\
sum().sort_values("LogCrime",ascending=False)
p_3 = sns.catplot(y="LSOA name",x="LogCrime",hue="Covid", ci=None,
kind="bar",data=data_agg, order=order.index.astype(str),
alpha=0.5, palette=["b", "r", "g"])
p_3.fig.suptitle("Crime cases in the 20 LSOA'S with most crime",y=1)
p_3
data_agg = data[["LSOA name", "Covid", "Crime", "Category"]].groupby(["LSOA name", "Covid", "Crime"]).count().fillna(0)
data_agg['LogCrime'] = np.log(data_agg['Category']+1)
data_agg.reset_index(inplace=True)
## Aggregating all Areas by Crime Type: Boxen
sns.set_palette(["#0000b3", "#cc0000", "#339933",])
p_5 = sns.catplot(x="Covid",y="LogCrime", col="Crime", kind="boxen",
data=data_agg, col_wrap=3)
p_5.fig.suptitle("Boxen plots for all types of crime comparing time period",y=1.1)
Several conclusions can be obtained from the boxen plots for the distribution of crimes by LSOA.
The figure below illustrates outcome categories for the crime events analyzed in the sections above. Regarding the categories, there doesn't seem to be any major difference in the results when faceting by the Pre Lockdown, Lockdown and Post Lockdown periods. Perhaps the only interesting difference is that fewer crimes were further investigated during the Lockdown period; nevertheless, this difference simply stems from the fact that more Anti-social behaviour crimes occurred during this period.
We note that no crimes that occurred during Pre Lockdown times are still under investigation. However, a lot of cases from this period have an unavailable court result.
data_agg = data[["Covid", "Crime", "Category"]].groupby(["Covid","Category"]).\
count().fillna(0).reset_index()
## Lump counts
counts = data_agg.groupby("Category")["Crime"].sum().sort_values(ascending=False)
data_agg["Category"] = np.where(data_agg.Category.isin(counts.index[-1:-7:-1]),
"Other", data_agg.Category)
## Aggregate and order
data_agg = data_agg.groupby(["Covid","Category"]).sum().reset_index()
data_agg["LogCrime"] = np.log(data_agg["Crime"]+1)
order = data_agg.groupby(["Category"]).sum().Crime.sort_values(ascending=False).index.astype(str)
p_6 = sns.barplot(x="LogCrime",y="Category", hue="Covid", order=order, data=data_agg)
p_6.set_title("Crime Outcomes by period")
In this section I explore the geographical distribution of crime dynamics further by plotting maps at both the MSOA and Borough levels.
## Load Crime Data
data = pickle.load(open("Data/DataClean.p", "rb" ))
## Information on LSOA
data_LSOA = gpd.read_file("Data/LSOA/LSOA_2011_London_gen_MHW.shp")
data_LSOA = data_LSOA.to_crs(epsg=4326)
## Information on MSOA
data_MSOA = gpd.read_file("Data/MSOA/MSOA_2011_London_gen_MHW.shp")
data_MSOA = data_MSOA.to_crs(epsg=4326)
data_MSOA.head()
| | MSOA11CD | MSOA11NM | LAD11CD | LAD11NM | RGN11CD | RGN11NM | USUALRES | HHOLDRES | COMESTRES | POPDEN | HHOLDS | AVHHOLDSZ | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | E02000001 | City of London 001 | E09000001 | City of London | E12000007 | London | 7375 | 7187 | 188 | 25.5 | 4385 | 1.6 | MULTIPOLYGON (((-0.10414 51.50841, -0.10444 51... |
| 1 | E02000002 | Barking and Dagenham 001 | E09000002 | Barking and Dagenham | E12000007 | London | 6775 | 6724 | 51 | 31.3 | 2713 | 2.5 | POLYGON ((0.14809 51.59678, 0.14806 51.59640, ... |
| 2 | E02000003 | Barking and Dagenham 002 | E09000002 | Barking and Dagenham | E12000007 | London | 10045 | 10033 | 12 | 46.9 | 3834 | 2.6 | POLYGON ((0.15063 51.58306, 0.14838 51.58075, ... |
| 3 | E02000004 | Barking and Dagenham 003 | E09000002 | Barking and Dagenham | E12000007 | London | 6182 | 5937 | 245 | 24.8 | 2318 | 2.6 | POLYGON ((0.18508 51.56480, 0.18400 51.56391, ... |
| 4 | E02000005 | Barking and Dagenham 004 | E09000002 | Barking and Dagenham | E12000007 | London | 8562 | 8562 | 0 | 72.1 | 3183 | 2.7 | POLYGON ((0.14988 51.56807, 0.15076 51.56778, ... |
## Correspondence table between LSOA and MSOA
match_table = data_LSOA[["LSOA11CD","MSOA11NM"]]
## Unique LSOA codes in police dataset
codes = data["LSOA code"].unique()
ids = pd.Series(codes).isin(match_table["LSOA11CD"])
## Codes that can't be matched:
## There are some weird codes but they don't seem to belong to London (drop them)
data[~data["LSOA code"].isin(codes[ids])]["LSOA name"].unique()
## Keep data that can be matched with LSOA polygons
use_data = data[data["LSOA code"].isin(codes[ids])]
use_data = pd.merge(use_data, match_table, how="left", left_on=["LSOA code"],
right_on=["LSOA11CD"])
## Calculate counts that are going to appear in the map
ccounts = use_data[["MSOA11NM","Covid", "Crime"]].groupby(["MSOA11NM", "Covid"]).\
count().fillna(0)
ccounts.reset_index(inplace=True)
ccounts["CrimeVariation(%)"] = ccounts.groupby("MSOA11NM")["Crime"].\
    transform(lambda x: np.append([np.nan], 100*(np.log(x[1:].values)-np.log(x[:-1].values))))
ccounts["CrimeChange"] = ccounts[["MSOA11NM", "Crime"]].groupby("MSOA11NM").\
transform(lambda x: np.append([np.nan], (x[1:].values)-x[:-1].values))
## Merge geometry with counts
data_map = data_MSOA.merge(ccounts[ccounts.Covid != "PreLockdown"], how="right")
data_map.set_index(data_map.MSOA11CD, inplace=True)
## Get Georeference json to plot
geo_data = data_map[["geometry"]].__geo_interface__
## Plot JSON + Counts
fig = px.choropleth(data_map[["MSOA11CD","MSOA11NM","CrimeChange", "CrimeVariation(%)", "Covid"]],
geojson=geo_data, locations = "MSOA11CD", color='CrimeVariation(%)',
color_continuous_scale=px.colors.sequential.Blackbody[::-1],
facet_col="Covid", hover_name="MSOA11NM",
hover_data ={'MSOA11CD':False, 'CrimeVariation(%)':':.2f',
'CrimeChange':':.2f', 'Covid':False})
fig.update_geos(fitbounds="locations", visible=False)
fig.update_layout(title="Approximate variation in crimes by MSOA")
#fig.write_html("Mapa1.html")
fig
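The CrimeVariation(%) column mapped above is a difference of logs, which approximates the exact percentage change when variations are small; a quick check with illustrative counts:

```python
import numpy as np

counts = np.array([100.0, 105.0, 90.0])  # illustrative monthly crime counts

# Log-difference approximation, as used for CrimeVariation(%)
log_diff = 100 * np.diff(np.log(counts))
# Exact percentage change, for comparison
exact_pct = 100 * (counts[1:] / counts[:-1] - 1)

print(log_diff)   # approximately 4.88 and -15.42
print(exact_pct)  # exactly 5.0 and about -14.29
```

The approximation slightly understates increases and overstates decreases, which is worth keeping in mind when reading the largest swings on the map; the CrimeChange column gives the exact level change.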